This document provides the code and output from the Time Series I class. This is a living document and may be updated throughout the semester (when this occurs, you will be notified). First, here is a list of all the libraries that you will need:
library(tseries)
library(forecast)
library(haven)
library(fma)
library(expsmooth)
library(lmtest)
library(zoo)
library(seasonal)
library(ggplot2)
library(seasonalview)
library(aTSA)
library(imputeTS)
library(prophet)
The data sets that you will need are as follows (be sure to put the correct location and file name for your computer):
#file.dir = "..."
input.file1 = "usairlines.csv"
input.file2 = "steel.csv"
input.file3 = "leadyear.csv"
input.file4 = "ebay9899.csv"
input.file5 = "fpp_insurance.csv"
input.file6 = "ar2.csv"
input.file7 = "MA2.csv"
input.file8 = "hurrican.csv"
# Reads the data at specified directory
# If the file directory is incorrect, then this won't run
USAirlines = read.table(paste(file.dir, input.file1,sep = ""),sep=",", header=T)
Steel = read.table(paste(file.dir, input.file2, sep = ""),sep=",",header=T)
Lead.Year = read.table(paste(file.dir, input.file3, sep = ""),sep=",",header=T)
Ebay = read.table(paste(file.dir, input.file4, sep = ""),sep=",",header=T)
Quotes= read.table(paste(file.dir, input.file5, sep = ""),sep=",",header=T)
Y= read.table(paste(file.dir, input.file6, sep = ""),sep=",",header=T)
x=read.table(paste(file.dir, input.file7, sep = ""),sep=",",header=T)
hurricane=read.table(paste(file.dir, input.file8, sep = ""),sep=",",header=T)
For many time series applications, you will need a time series object in R. This is created using the function ts. For example, the time series in the USAirlines data frame is in the column “Passengers”. Let’s go ahead and create the time series object for this data set and graph it.
Passenger <- ts(USAirlines$Passengers, start = 1990, frequency =12)
autoplot(Passenger)+labs(title="Time Series plot for Passengers", x="Date",y="Passengers")
Within the ts command, the only required argument is the vector of data that contains the time series information (in this case USAirlines$Passengers). The optional argument “start” is for nicer plotting (the plot uses the correct time frame instead of just 1, 2, 3, … for time). The last argument shown is “frequency”, which is for seasonal data. If your time series object has a seasonality to it, then you should specify the length of the season (R does not know this unless you provide it). For future analysis, we will need to create the time series object for Steel.
SteelShp <- ts(Steel$steelshp, start = 1984, frequency = 12)
If your time series has a seasonal component to it, a useful visualization is the decomposition. We will be using the STL decomposition (which can only do an additive decomposition, NOT multiplicative!). The following code creates the decomposition and then plots it:
# Time Series Decomposition ...STL#
decomp_stl <- stl(Passenger, s.window = 7)
# Plot the individual components of the time series
plot(decomp_stl)
autoplot(decomp_stl)
You can pull off the different components (Seasonal, Trend or Remainder).
head(decomp_stl$time.series)
## seasonal trend remainder
## Jan 1990 -4526.7610 39081.77 -207.0131
## Feb 1990 -5827.5592 38942.75 420.8128
## Mar 1990 560.4986 38803.72 1213.7829
## Apr 1990 -802.2312 38664.69 404.5406
## May 1990 139.2095 38533.15 -423.3574
## Jun 1990 2953.8857 38401.61 -563.4910
This means we can overlay the original data with the trend component (the second column).
autoplot(Passenger)+geom_line(aes(y=decomp_stl$time.series[,2]),color="blue")
Notice that the trend component is VERY similar to the “seasonally adjusted” data! Do you know what the difference between the two series is?
seas_adj=Passenger-decomp_stl$time.series[,1]
autoplot(Passenger) +
geom_line(aes(y=decomp_stl$time.series[,2]),color="blue") +
geom_line(aes(y=seas_adj),color="orange")
Another interesting plot is the subseries plot. This looks at the individual seasonal series (in this case, the series for January, the series for February, etc.).
# Plot seasonal subseries by months
ggsubseriesplot(Passenger)
Just a quick note: STL ONLY does additive seasonal decomposition. The decompose() function in base R will do both additive AND multiplicative decomposition.
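A minimal sketch of the classical decomposition via decompose() (using the built-in AirPassengers series so it runs stand-alone; substitute the Passenger object created above for our data):

```r
# Classical decomposition via base R's decompose(): choose "additive" or "multiplicative"
# (AirPassengers is a built-in monthly ts object, so no files are needed)
decomp_add  <- decompose(AirPassengers, type = "additive")
decomp_mult <- decompose(AirPassengers, type = "multiplicative")

# Plot the observed, trend, seasonal, and random components
plot(decomp_mult)
```

As with stl, the individual components are stored in the returned object (decomp_mult$trend, decomp_mult$seasonal, decomp_mult$random).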
Here is something to get you started if you want to take a look at the X13 decomposition!
decomp_x13=seas(Passenger)
summary(decomp_x13)
##
## Call:
## seas(x = Passenger)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Leap Year 1.610e+03 4.003e+02 4.021 5.79e-05 ***
## Weekday 5.586e+01 1.890e+01 2.956 0.00312 **
## LS1992.Jun 4.006e+03 8.439e+02 4.747 2.07e-06 ***
## LS1992.Oct -3.511e+03 8.409e+02 -4.176 2.97e-05 ***
## LS2001.Sep -1.580e+04 8.633e+02 -18.296 < 2e-16 ***
## LS2001.Nov 5.492e+03 8.629e+02 6.365 1.96e-10 ***
## AO2002.Dec 4.412e+03 8.402e+02 5.251 1.51e-07 ***
## MA-Nonseasonal-01 5.073e-01 5.991e-02 8.468 < 2e-16 ***
## MA-Seasonal-12 4.981e-01 6.133e-02 8.121 4.62e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## SEATS adj. ARIMA: (0 1 1)(0 1 1) Obs.: 219 Transform: none
## AICc: 3501, BIC: 3533 QS (no seasonality in final): 0
## Box-Ljung (no autocorr.): 22.77 Shapiro (normality): 0.9914
## Messages generated by X-13:
## Warnings:
## - At least one visually significant trading day peak has been
## found in one or more of the estimated spectra.
Neat R Shiny application…run it OUTSIDE of R Markdown:
#view(decomp_x13)
In R, we are able to fit Simple Exponential Smoothing Models, Holt Exponential Smoothing Models and Holt-Winters Exponential Smoothing Models. For the first two (Simple and Holt), we will be using the Steel data set, and for the last one (Holt-Winters), we will be using the airline data set (we will also use the airline data set to illustrate the damped trend model). Each of these is shown below.
For Simple Exponential Smoothing Models (SES), we have only one component, referred to as the level component.
This is basically a weighted average with the last observation and the last predicted value. Since this only has a level component, forecasts from SES models will be a horizontal line (hence why this method is called “one-step ahead” forecasting).
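In standard textbook component notation (not the course's own notation), the SES equations are:

```latex
\begin{aligned}
\ell_t &= \alpha Y_t + (1-\alpha)\,\ell_{t-1}, \qquad 0 < \alpha < 1
  && \text{(level: the weighted average)}\\
\hat{Y}_{t+h\mid t} &= \ell_t
  && \text{(every forecast equals the last level)}
\end{aligned}
```

The second equation is why the forecast is a horizontal line: all h-step-ahead forecasts equal the most recent level.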
In the R code, you can choose how the initial values are selected. If you specify “simple”, then the first few observations are used to estimate the starting value. If you select “optimal”, then the algorithm uses ets (discussed later) to optimize the starting values and the smoothing parameters. You can also specify the value for “h”, which is the number of forecasts to create (take a look at the forecast…do you see a horizontal line?).
# Building a Simple Exponential Smoothing (SES) Model - Steel Data #
SES.Steel <- ses(SteelShp, initial = "simple", h = 24)
summary(SES.Steel)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = SteelShp, h = 24, initial = "simple")
##
## Smoothing parameters:
## alpha = 0.4549
##
## Initial states:
## l = 5980
##
## sigma: 460.4357
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 11.43866 460.4357 363.9341 -0.2204828 5.708307 0.8287599
## ACF1
## Training set -0.04379112
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1992 6479.571 5889.499 7069.643 5577.133 7382.008
## Feb 1992 6479.571 5831.305 7127.836 5488.134 7471.007
## Mar 1992 6479.571 5777.922 7181.219 5406.492 7552.649
## Apr 1992 6479.571 5728.323 7230.819 5330.636 7628.505
## May 1992 6479.571 5681.801 7277.340 5259.487 7699.654
## Jun 1992 6479.571 5637.847 7321.295 5192.265 7766.876
## Jul 1992 6479.571 5596.077 7363.065 5128.383 7830.758
## Aug 1992 6479.571 5556.194 7402.947 5067.388 7891.754
## Sep 1992 6479.571 5517.964 7441.177 5008.920 7950.221
## Oct 1992 6479.571 5481.197 7477.944 4952.690 8006.452
## Nov 1992 6479.571 5445.736 7513.405 4898.458 8060.684
## Dec 1992 6479.571 5411.453 7547.689 4846.025 8113.116
## Jan 1993 6479.571 5378.236 7580.906 4795.224 8163.917
## Feb 1993 6479.571 5345.992 7613.150 4745.911 8213.230
## Mar 1993 6479.571 5314.640 7644.502 4697.962 8261.179
## Apr 1993 6479.571 5284.110 7675.032 4651.271 8307.871
## May 1993 6479.571 5254.340 7704.801 4605.742 8353.399
## Jun 1993 6479.571 5225.277 7733.864 4561.294 8397.847
## Jul 1993 6479.571 5196.872 7762.269 4517.853 8441.289
## Aug 1993 6479.571 5169.083 7790.058 4475.353 8483.789
## Sep 1993 6479.571 5141.871 7817.271 4433.735 8525.406
## Oct 1993 6479.571 5115.201 7843.940 4392.948 8566.194
## Nov 1993 6479.571 5089.043 7870.098 4352.942 8606.199
## Dec 1993 6479.571 5063.368 7895.773 4313.676 8645.465
# Plot the SES model on steel data
autoplot(SES.Steel)+
autolayer(fitted(SES.Steel),series="Fitted")+ylab("US Steel Shipments") + geom_vline(xintercept = 1992,color="orange",linetype="dashed")
# Computes accuracy statistics for SES model on steel data (training data...NOT validation nor test)
round(accuracy(SES.Steel),2)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 11.44 460.44 363.93 -0.22 5.71 0.83 -0.04
The Holt model incorporates trend information. So, now there are two components: level and trend. For each component, there will be a smoothing coefficient (or weight). CAREFUL: when you look at the parameter estimates, these are NOT estimates of the mean or the linear trend…you should think of them as weights (between 0 and 1). The overall form for Holt’s method is:
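In standard textbook notation (not the course's own notation):

```latex
\begin{aligned}
\ell_t &= \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})
  && \text{(level)}\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}
  && \text{(trend)}\\
\hat{Y}_{t+h\mid t} &= \ell_t + h\,b_t
  && \text{(forecast)}
\end{aligned}
```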
For the Holt’s method, when you forecast, you will see a trending line.
# Building a Linear Exponential Smoothing Model - Steel Data #
LES.Steel <- holt(SteelShp, initial = "optimal", h = 24)
summary(LES.Steel)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = SteelShp, h = 24, initial = "optimal")
##
## Smoothing parameters:
## alpha = 0.4329
## beta = 1e-04
##
## Initial states:
## l = 6678.9989
## b = -0.0651
##
## sigma: 471.4322
##
## AIC AICc BIC
## 1626.001 1626.667 1638.822
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -4.167318 461.5062 369.9177 -0.4760441 5.818476 0.8423858
## ACF1
## Training set -0.03556298
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1992 6495.463 5891.298 7099.627 5571.472 7419.453
## Feb 1992 6495.357 5836.979 7153.736 5488.455 7502.260
## Mar 1992 6495.252 5786.775 7203.730 5411.729 7578.775
## Apr 1992 6495.147 5739.865 7250.429 5340.043 7650.251
## May 1992 6495.042 5695.672 7294.412 5272.511 7717.573
## Jun 1992 6494.937 5653.768 7336.106 5208.479 7781.395
## Jul 1992 6494.832 5613.826 7375.838 5147.450 7842.214
## Aug 1992 6494.727 5575.592 7413.861 5089.032 7900.421
## Sep 1992 6494.622 5538.862 7450.381 5032.914 7956.330
## Oct 1992 6494.516 5503.468 7485.565 4978.839 8010.194
## Nov 1992 6494.411 5469.273 7519.549 4926.598 8062.225
## Dec 1992 6494.306 5436.161 7552.452 4876.013 8112.600
## Jan 1993 6494.201 5404.033 7584.369 4826.933 8161.470
## Feb 1993 6494.096 5372.805 7615.387 4779.229 8208.963
## Mar 1993 6493.991 5342.404 7645.578 4732.791 8255.191
## Apr 1993 6493.886 5312.767 7675.005 4687.520 8300.252
## May 1993 6493.781 5283.837 7703.725 4643.331 8344.230
## Jun 1993 6493.675 5255.565 7731.786 4600.149 8387.202
## Jul 1993 6493.570 5227.907 7759.234 4557.905 8429.235
## Aug 1993 6493.465 5200.824 7786.106 4516.541 8470.389
## Sep 1993 6493.360 5174.281 7812.439 4476.003 8510.718
## Oct 1993 6493.255 5148.246 7838.264 4436.241 8550.269
## Nov 1993 6493.150 5122.689 7863.611 4397.211 8589.089
## Dec 1993 6493.045 5097.585 7888.504 4358.874 8627.216
# Plot the LES model on steel data
autoplot(LES.Steel)+
autolayer(fitted(LES.Steel),series="Fitted")+labs(title="US Steel Shipment with Holt forecasts",y="US Steel Shipments") + geom_vline(xintercept = 1992,color="orange",linetype="dashed")
As mentioned, we can perform Holt’s method while assuming a damped trend. You will see that the formula for the damped trend is similar to the previous Holt formula, with the addition of a damping parameter.
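In standard textbook notation, the damping parameter $\phi$ (with $0 < \phi < 1$) enters as:

```latex
\begin{aligned}
\ell_t &= \alpha Y_t + (1-\alpha)(\ell_{t-1} + \phi\, b_{t-1})\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,\phi\, b_{t-1}\\
\hat{Y}_{t+h\mid t} &= \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\, b_t
\end{aligned}
```

Because the powers of $\phi$ shrink toward zero, the forecasted trend flattens out as h grows rather than continuing in a straight line.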
We will illustrate the damped trend on both the Steel and Airline data sets.
LDES.Steel <- holt(SteelShp, initial = "optimal", h = 24, damped = TRUE)
summary(LDES.Steel)
##
## Forecast method: Damped Holt's method
##
## Model Information:
## Damped Holt's method
##
## Call:
## holt(y = SteelShp, h = 24, damped = TRUE, initial = "optimal")
##
## Smoothing parameters:
## alpha = 0.4202
## beta = 1e-04
## phi = 0.9083
##
## Initial states:
## l = 6692.5352
## b = -54.4181
##
## sigma: 472.702
##
## AIC AICc BIC
## 1627.468 1628.412 1642.854
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 8.669068 460.2275 367.178 -0.2689588 5.765085 0.836147 -0.02936686
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1992 6504.538 5898.746 7110.330 5578.059 7431.017
## Feb 1992 6504.482 5847.350 7161.614 5499.485 7509.479
## Mar 1992 6504.432 5799.671 7209.192 5426.594 7582.269
## Apr 1992 6504.386 5755.003 7253.768 5358.304 7650.467
## May 1992 6504.344 5712.838 7295.850 5293.839 7714.849
## Jun 1992 6504.306 5672.796 7335.817 5232.621 7775.992
## Jul 1992 6504.272 5634.585 7373.958 5174.201 7834.342
## Aug 1992 6504.241 5597.976 7410.505 5118.229 7890.252
## Sep 1992 6504.212 5562.783 7445.642 5064.420 7944.005
## Oct 1992 6504.187 5528.852 7479.521 5012.541 7995.832
## Nov 1992 6504.163 5496.057 7512.269 4962.398 8045.928
## Dec 1992 6504.142 5464.292 7543.992 4913.829 8094.455
## Jan 1993 6504.123 5433.465 7574.780 4866.693 8141.552
## Feb 1993 6504.105 5403.498 7604.712 4820.871 8187.339
## Mar 1993 6504.089 5374.322 7633.856 4776.260 8231.919
## Apr 1993 6504.075 5345.879 7662.271 4732.767 8275.383
## May 1993 6504.062 5318.115 7690.008 4690.313 8317.810
## Jun 1993 6504.050 5290.985 7717.114 4648.827 8359.272
## Jul 1993 6504.039 5264.447 7743.631 4608.247 8399.831
## Aug 1993 6504.029 5238.464 7769.594 4568.514 8439.544
## Sep 1993 6504.020 5213.002 7795.038 4529.578 8478.462
## Oct 1993 6504.012 5188.032 7819.992 4491.394 8516.630
## Nov 1993 6504.005 5163.526 7844.483 4453.919 8554.090
## Dec 1993 6503.998 5139.459 7868.537 4417.116 8590.880
autoplot(LDES.Steel)+
autolayer(fitted(LDES.Steel),series="Fitted")+labs(title="US Steel Shipment Linear Damped ESM Forecast") + geom_vline(xintercept = 1992,color="orange",linetype="dashed")
LDES.USAir <- holt(Passenger, initial = "optimal", h = 24, damped = TRUE)
summary(LDES.USAir)
##
## Forecast method: Damped Holt's method
##
## Model Information:
## Damped Holt's method
##
## Call:
## holt(y = Passenger, h = 24, damped = TRUE, initial = "optimal")
##
## Smoothing parameters:
## alpha = 0.5721
## beta = 1e-04
## phi = 0.8057
##
## Initial states:
## l = 36886.1093
## b = 527.1535
##
## sigma: 4874.3
##
## AIC AICc BIC
## 4906.527 4906.924 4926.862
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 196.2283 4818.336 3529.079 -0.2896149 7.191284 1.326257 0.04535328
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2008 63673.96 57427.30 69920.63 54120.51 73227.42
## May 2008 63674.11 56477.18 70871.04 52667.35 74680.87
## Jun 2008 63674.23 55638.45 71710.01 51384.57 75963.89
## Jul 2008 63674.33 54879.22 72469.44 50223.37 77125.29
## Aug 2008 63674.41 54180.40 73168.41 49154.58 78194.23
## Sep 2008 63674.47 53529.54 73819.40 48159.13 79189.80
## Oct 2008 63674.52 52917.92 74431.12 47223.71 80125.32
## Nov 2008 63674.56 52339.20 75009.92 46338.63 81010.49
## Dec 2008 63674.59 51788.59 75560.59 45496.53 81852.66
## Jan 2009 63674.62 51262.36 76086.88 44691.70 82657.53
## Feb 2009 63674.64 50757.52 76591.76 43919.61 83429.67
## Mar 2009 63674.66 50271.67 77077.65 43176.55 84172.76
## Apr 2009 63674.67 49802.80 77546.54 42459.48 84889.86
## May 2009 63674.68 49349.27 78000.09 41765.85 85583.51
## Jun 2009 63674.69 48909.65 78439.73 41093.51 86255.87
## Jul 2009 63674.70 48482.74 78866.66 40440.60 86908.80
## Aug 2009 63674.70 48067.49 79281.91 39805.54 87543.87
## Sep 2009 63674.71 47663.01 79686.40 39186.93 88162.48
## Oct 2009 63674.71 47268.49 80080.93 38583.57 88765.85
## Nov 2009 63674.71 46883.24 80466.19 37994.37 89355.06
## Dec 2009 63674.72 46506.63 80842.80 37418.39 89931.04
## Jan 2010 63674.72 46138.10 81211.34 36854.78 90494.66
## Feb 2010 63674.72 45777.16 81572.28 36302.76 91046.68
## Mar 2010 63674.72 45423.35 81926.09 35761.66 91587.78
autoplot(LDES.USAir)+
autolayer(fitted(LDES.USAir),series="Fitted")+labs(title="US Airline Passengers with Linear Damped ESM Forecast") + geom_vline(xintercept = 2008.25,color="orange",linetype="dashed")
The Holt-Winters (HW) model has three components to it (level, trend and seasonality). Seasonality is an interesting component to model since we can have an additive seasonal component or a multiplicative seasonal component. Both models are shown below:
Additive HW
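In standard textbook notation (with p the seasonal period):

```latex
\begin{aligned}
\ell_t &= \alpha(Y_t - s_{t-p}) + (1-\alpha)(\ell_{t-1} + b_{t-1})\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}\\
s_t &= \gamma(Y_t - \ell_t) + (1-\gamma)\,s_{t-p}\\
\hat{Y}_{t+h\mid t} &= \ell_t + h\,b_t + s_{t+h-p} \quad (h \le p)
\end{aligned}
```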
Multiplicative HW
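In standard textbook notation (with p the seasonal period), the seasonal factors now multiply rather than add:

```latex
\begin{aligned}
\ell_t &= \alpha\,\frac{Y_t}{s_{t-p}} + (1-\alpha)(\ell_{t-1} + b_{t-1})\\
b_t &= \beta(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}\\
s_t &= \gamma\,\frac{Y_t}{\ell_t} + (1-\gamma)\,s_{t-p}\\
\hat{Y}_{t+h\mid t} &= (\ell_t + h\,b_t)\,s_{t+h-p} \quad (h \le p)
\end{aligned}
```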
Where p is the frequency of the seasonality (i.e. how many “seasons” there are within one year).
# Building a Holt-Winters ESM - US Airlines Data - Additive Seasonality
HWES.USAir <- hw(Passenger, seasonal = "additive")
summary(HWES.USAir)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = Passenger, seasonal = "additive")
##
## Smoothing parameters:
## alpha = 0.5967
## beta = 1e-04
## gamma = 1e-04
##
## Initial states:
## l = 38384.1383
## b = 171.3139
## s = -1607.408 -2735.133 -266.9438 -4488.589 6259.346 6626.648
## 4166.006 1369.769 250.523 2806.828 -6741.171 -5639.874
##
## sigma: 1949.79
##
## AIC AICc BIC
## 4515.651 4518.696 4573.265
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -84.80235 1877.214 1168.093 -0.2917412 2.495749 0.4389788
## ACF1
## Training set 0.06636172
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2008 65005.38 62506.62 67504.13 61183.86 68826.90
## May 2008 66294.00 63384.09 69203.92 61843.67 70744.34
## Jun 2008 69259.75 65989.86 72529.64 64258.88 74260.62
## Jul 2008 71890.05 68295.96 75484.15 66393.36 77386.74
## Aug 2008 71692.11 67800.64 75583.59 65740.62 77643.61
## Sep 2008 61113.61 56945.83 65281.39 54739.54 67487.68
## Oct 2008 65504.72 61077.76 69931.67 58734.27 72275.16
## Nov 2008 63206.01 58534.16 67877.87 56061.02 70351.00
## Dec 2008 64502.98 59598.36 69407.60 57002.01 72003.95
## Jan 2009 60640.36 55513.46 65767.26 52799.44 68481.28
## Feb 2009 59708.43 54368.43 65048.44 51541.60 67875.27
## Mar 2009 69425.67 63880.68 74970.66 60945.34 77906.01
## Apr 2009 67038.85 61296.05 72781.64 58255.99 75821.70
## May 2009 68327.47 62393.46 74261.49 59252.18 77402.76
## Jun 2009 71293.22 65173.90 77412.54 61934.53 80651.91
## Jul 2009 73923.52 67624.29 80222.75 64289.67 83557.37
## Aug 2009 73725.58 67251.38 80199.79 63824.14 83627.03
## Sep 2009 63147.08 56502.45 69791.72 52984.99 73309.17
## Oct 2009 67538.18 60727.34 74349.03 57121.89 77954.48
## Nov 2009 65239.48 58266.32 72212.64 54574.96 75904.01
## Dec 2009 66536.45 59404.62 73668.27 55629.26 77443.63
## Jan 2010 62673.83 55386.73 69960.92 51529.18 73818.47
## Feb 2010 61741.90 54302.73 69181.07 50364.67 73119.13
## Mar 2010 71459.14 63870.89 79047.39 59853.92 83064.36
autoplot(HWES.USAir)+
autolayer(fitted(HWES.USAir),series="Fitted")+ylab("Airlines Passengers")+ geom_vline(xintercept = 2008.25,color="orange",linetype="dashed")
# Building a Holt-Winters ESM - US Airlines Data - Multiplicative Seasonality
HWES.USAir <- hw(Passenger, seasonal = "multiplicative")
summary(HWES.USAir)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = Passenger, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4372
## beta = 1e-04
## gamma = 0.2075
##
## Initial states:
## l = 38293.1221
## b = 173.9926
## s = 0.9658 0.962 1.0064 0.9745 1.1393 1.0801
## 1.0368 0.9994 1.0012 1.0401 0.8799 0.9146
##
## sigma: 0.0381
##
## AIC AICc BIC
## 4504.228 4507.272 4561.842
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -113.1889 1848.797 1090.105 -0.383246 2.303162 0.4096702 0.1713934
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2008 65528.49 62329.64 68727.34 60636.28 70420.70
## May 2008 66821.29 63262.13 70380.46 61378.02 72264.57
## Jun 2008 70075.52 66056.65 74094.40 63929.19 76221.86
## Jul 2008 73144.30 68671.62 77616.98 66303.93 79984.68
## Aug 2008 71313.29 66698.66 75927.92 64255.83 78370.76
## Sep 2008 58655.19 54662.55 62647.83 52548.97 64761.40
## Oct 2008 64544.10 59944.79 69143.41 57510.06 71578.13
## Nov 2008 62554.65 57906.98 67202.32 55446.66 69662.65
## Dec 2008 63629.25 58716.73 68541.77 56116.20 71142.30
## Jan 2009 59379.95 54629.79 64130.10 52115.21 66644.68
## Feb 2009 57852.76 53069.42 62636.10 50537.28 65168.25
## Mar 2009 70682.78 64655.56 76710.00 61464.94 79900.62
## Apr 2009 67606.25 61473.18 73739.32 58226.52 76985.97
## May 2009 68934.51 62519.04 75349.97 59122.89 78746.12
## Jun 2009 72285.87 65393.58 79178.16 61745.02 82826.72
## Jul 2009 75445.45 68084.46 82806.44 64187.79 86703.11
## Aug 2009 73551.02 66215.90 80886.14 62332.92 84769.12
## Sep 2009 60490.96 54330.83 66651.08 51069.86 69912.06
## Oct 2009 66558.97 59644.04 73473.89 55983.50 77134.43
## Nov 2009 64502.39 57671.58 71333.21 54055.56 74949.22
## Dec 2009 65605.36 58528.93 72681.80 54782.90 76427.83
## Jan 2010 61219.37 54498.44 67940.31 50940.59 71498.16
## Feb 2010 59640.30 52980.57 66300.04 49455.12 69825.49
## Mar 2010 72861.19 64590.90 81131.47 60212.88 85509.49
autoplot(HWES.USAir)+
autolayer(fitted(HWES.USAir),series="Fitted")+ylab("Airlines Passengers")+ geom_vline(xintercept = 2008.25,color="orange",linetype="dashed")
In order to get a better idea of the forecasting properties of the algorithms, it is best to divide your data into a training data set and a test data set. Time series is VERY different from the other analyses you have done: the test data set should come at the END of the time series (to truly see how well you can forecast!). An example is shown below, in which the last 12 observations are used as the test data set:
# Create training set from overall Airlines Data
training=subset(Passenger,end=length(Passenger)-12)
# Create test set from overall Airlines Data
test=subset(Passenger,start=length(Passenger)-11)
# Fit Holt-Winters ESM (multiplicative seasonality) on training data
HWES.USAir.train <- hw(training, seasonal = "multiplicative",initial='optimal',h=12)
# Calculate prediction errors from forecast
error=test-HWES.USAir.train$mean
# Calculate prediction error statistics (MAE and MAPE)
MAE=mean(abs(error))
MAPE=mean(abs(error)/abs(test))
MAE
## [1] 1134.58
MAPE
## [1] 0.01763593
You can also allow the computer to search for the best model. The ETS (Error, Trend, Seasonality) algorithm will search for the best model and estimate the parameters. For the error term, we can have either an additive or multiplicative error structure. For the trend, we can have none, additive, multiplicative, damped additive or damped multiplicative. For the seasonal component, we can have none, additive or multiplicative (lots of choices!). An example of how to run this is:
ets.passenger<-ets(training)
summary(ets.passenger)
## ETS(M,Ad,M)
##
## Call:
## ets(y = training)
##
## Smoothing parameters:
## alpha = 0.6485
## beta = 1e-04
## gamma = 1e-04
## phi = 0.9755
##
## Initial states:
## l = 38298.6991
## b = 99.6316
## s = 0.9696 0.9436 0.9968 0.919 1.1276 1.1289
## 1.0784 1.0223 1.0025 1.0531 0.8681 0.89
##
## sigma: 0.0369
##
## AIC AICc BIC
## 4225.929 4229.567 4285.918
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 157.1888 1747.817 1043.596 0.1980283 2.19083 0.38762 0.008869188
ets.forecast.passenger<-ets.passenger%>%forecast::forecast(h=12)
error=mean(abs(test-ets.forecast.passenger$mean))
error
## [1] 1153.042
We will now be switching over to doing ARIMA models!! There are a number of different concepts you will need in order to do this type of modeling.
Before we can try to model the dependency structure, we must first have a stationary time series! The ADF test is one of the most well-known and accepted tests for stationarity. There are several packages that will do this for you; however, below I am focusing on the ADF test within the package aTSA.
Quotes.ts<-ts(Quotes$Quotes,start=2002,frequency=12)
autoplot(Quotes.ts)+labs(title="Time Series of Daily Stock quotes", x="Time", y="Quotes")
The following code produces output similar to the output seen in SAS (under the tau test).
# Perform the ADF test (k=0)
aTSA::adf.test(Quotes.ts)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.3061 0.550
## [2,] 1 -0.5980 0.458
## [3,] 2 -0.0632 0.620
## [4,] 3 -0.0950 0.611
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -2.66 0.0939
## [2,] 1 -3.42 0.0192
## [3,] 2 -2.45 0.1608
## [4,] 3 -2.36 0.1943
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -2.62 0.3212
## [2,] 1 -3.36 0.0772
## [3,] 2 -2.41 0.4012
## [4,] 3 -2.29 0.4463
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
The Acf and Pacf functions in R will calculate the autocorrelations (up to the lag you specify) and the partial autocorrelations, respectively. You can either output these values to look at them or plot them (see code below).
acf1=Acf(Y, lag=10)$acf
pacf1=Pacf(Y, lag=10)$acf
index1=seq(1,length(pacf1))
all.dat=data.frame(cbind(acf1[2:11],pacf1,index1))
colnames(all.dat)=c("acf","pacf","index")
ggplot(all.dat,aes(x=factor(index),y=acf))+geom_col()+labs(x="Lags")
For residuals to exhibit white noise, they must be “independent” and normally distributed with mean 0 and constant variance. You already know how to assess normality and constant variance; however, we need to focus on assessing “independence”. We can assess whether there is significant dependence through the Ljung-Box test. The hypotheses being tested are
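Written out (the standard form of the Ljung-Box hypotheses, testing up to lag m):

```latex
\begin{aligned}
H_0 &: \rho_1 = \rho_2 = \cdots = \rho_m = 0
  && \text{(no autocorrelation: white noise)}\\
H_a &: \rho_k \neq 0 \text{ for at least one } k \le m
  && \text{(significant dependence remains)}
\end{aligned}
```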
This should be assessed on a stationary time series. Looking at a stationary time series, going back 10 lags should be sufficient (this will be different when we get to seasonal models). We would like for all of the p-values (for lags 1-10) to be insignificant. However, keep in mind that sample size will matter when assessing significance.
White.LB <- rep(NA, 10)
for(i in 1:10){
White.LB[i] <- Box.test(Y, lag=i, type="Ljung-Box", fitdf = 0)$p.value
}
white.dat=data.frame(cbind(White.LB,index1))
colnames(white.dat)=c("pvalues","Lag")
ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Ljung-Box test p-values",x="Lags",y="p-values")+coord_cartesian(ylim = c(0, 0.025))
#### Fit an appropriate model
Y.ARIMA=Arima(Y,order=c(2,0,0))
White.LB <- rep(NA, 10)
for(i in 3:10){
White.LB[i] <- Box.test(Y.ARIMA$residuals, lag=i, type="Ljung-Box", fitdf = 2)$p.value
}
white.dat=data.frame(cbind(White.LB[3:10],index1[3:10]))
colnames(white.dat)=c("pvalues","Lag")
ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Ljung-Box test when there is white noise",x="Lags",y="p-values")
AutoRegressive (AR) models involve modeling the lags of Y. We can write an autoregressive model as
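In standard notation, an AR(p) model is:

```latex
Y_t = \omega + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t
```

where the $\phi_i$ are the autoregressive coefficients and $e_t$ is white noise.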
ggAcf(Y)
ggPacf(Y)
Y.ts <- ts(Y)
Y.ARIMA <- Arima(Y.ts, order=c(2,0,0))
ggAcf(Y.ARIMA$residuals)
ggPacf(Y.ARIMA$residuals)
Moving average (MA) models involve modeling the lags of the error. We can write a moving average model as
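In standard notation (using R's sign convention for the MA coefficients), an MA(q) model is:

```latex
Y_t = \omega + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q}
```

where the $e_t$ are white-noise errors. (Be careful: SAS writes the $\theta$ terms with minus signs, so the coefficient signs flip between the two programs.)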
ggAcf(x)
ggPacf(x)
x.ts <- ts(x)
x.ARIMA <- Arima(x.ts, order=c(0,0,2))
summary(x.ARIMA)
## Series: x.ts
## ARIMA(0,0,2) with non-zero mean
##
## Coefficients:
## ma1 ma2 mean
## -0.2460 0.4772 0.0250
## s.e. 0.0857 0.0923 0.0567
##
## sigma^2 estimated as 0.2207: log likelihood=-65.1
## AIC=138.2 AICc=138.63 BIC=148.62
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008828966 0.4627151 0.3808289 74.99115 114.4434 0.5453401
## ACF1
## Training set -0.002299708
ggAcf(x.ARIMA$residuals)
ggPacf(x.ARIMA$residuals)
If the series is trending, then it is NOT stationary. You will NEED to do something to the series in order to make it stationary!! You will either fit a linear regression line (then use the residuals to model dependencies) or take differences and use the differenced series to model dependencies. We will be using the Ebay stock data of “Daily High” (note that this data has missing values!!). If you do not impute the missing values, the ADF test simply ignores them (which could create an issue). I would recommend FIRST imputing those values BEFORE doing the ADF test (below, the missing values are imputed via interpolation).
Daily.High <- ts(Ebay$DailyHigh)
###NOT appropriate since there are missing values!!
aTSA::adf.test(Daily.High)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 0.410 0.762
## [2,] 1 0.157 0.689
## [3,] 2 0.254 0.717
## [4,] 3 0.327 0.738
## [5,] 4 0.396 0.758
## [6,] 5 0.431 0.768
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.79 0.409
## [2,] 1 -1.99 0.331
## [3,] 2 -1.90 0.366
## [4,] 3 -1.89 0.372
## [5,] 4 -1.88 0.375
## [6,] 5 -1.91 0.365
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -1.74 0.687
## [2,] 1 -2.09 0.538
## [3,] 2 -1.94 0.600
## [4,] 3 -1.87 0.632
## [5,] 4 -1.81 0.657
## [6,] 5 -1.80 0.662
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
ggplot_na_distribution(Daily.High)+labs(y="Stock prices for Ebay")
# Interpolate the missing observations in this data set
Daily.High<-Daily.High %>% na_interpolation(option = "spline")
autoplot(Daily.High)+labs(title="Daily high stock quotes",x="Time",y="Quotes")
# Perform an ADF test
aTSA::adf.test(Daily.High)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 0.414 0.763
## [2,] 1 0.148 0.687
## [3,] 2 0.266 0.720
## [4,] 3 0.330 0.739
## [5,] 4 0.385 0.755
## [6,] 5 0.434 0.769
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -1.80 0.405
## [2,] 1 -2.01 0.324
## [3,] 2 -1.91 0.364
## [4,] 3 -1.90 0.367
## [5,] 4 -1.90 0.368
## [6,] 5 -1.92 0.361
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -1.76 0.679
## [2,] 1 -2.13 0.520
## [3,] 2 -1.96 0.595
## [4,] 3 -1.89 0.622
## [5,] 4 -1.84 0.642
## [6,] 5 -1.82 0.653
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
### Definitely a Random Walk!!
How do we fit each situation? If the series is stationary about a line, we need to fit a line (and then model the AR and MA terms on the residuals). If the series is a random walk with drift, then we need to take differences. Here is the R code for each situation.
###Fitting a regression line...
time.high=seq(1,length(Daily.High))
ARIMA.line=Arima(Daily.High,order=c(0,0,0),xreg=time.high)
summary(ARIMA.line)
## Series: Daily.High
## Regression with ARIMA(0,0,0) errors
##
## Coefficients:
## intercept xreg
## 58.1845 0.3942
## s.e. 4.3071 0.0234
##
## sigma^2 estimated as 1477: log likelihood=-1610.58
## AIC=3227.17 AICc=3227.24 BIC=3238.45
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 4.157949e-12 38.31299 31.05748 -31.93382 49.92292 6.324794
## ACF1
## Training set 0.9819061
#### Now model residuals with AR and MA terms...can also send residuals through an automatic procedure to help!
####Fitting a random walk with drift
ARIMA.RW=Arima(Daily.High,order=c(0,1,0))
summary(ARIMA.RW)
## Series: Daily.High
## ARIMA(0,1,0)
##
## sigma^2 estimated as 47.76: log likelihood=-1062.6
## AIC=2127.21 AICc=2127.22 BIC=2130.96
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.4779789 6.900166 4.895049 0.4782673 4.541625 0.9968669 0.2026237
#### automatic procedure will determine if you need differencing or not!
We can use an automatic procedure to help us find a model. We will be using the mean maximum velocity in the hurricane data set. This data also has some missing values, which we need to look into first.
max.velocity=hurricane$MeanVMax
ggplot_na_distribution(max.velocity)+labs(y="Mean Max Velocity")
This is yearly data, and the reason those values are missing is that there were no hurricanes recorded in those years. Since there is no trend (nor seasonality), I am going to remove these NA values and then run the Dickey-Fuller test.
max.velocity=na.omit(max.velocity)
hurrican.ts=ts(max.velocity)
aTSA::adf.test(hurrican.ts)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.8242 0.384
## [2,] 1 -0.4391 0.517
## [3,] 2 -0.2585 0.569
## [4,] 3 -0.1254 0.607
## [5,] 4 -0.0692 0.623
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -10.69 0.01
## [2,] 1 -7.69 0.01
## [3,] 2 -5.09 0.01
## [4,] 3 -4.09 0.01
## [5,] 4 -3.62 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -11.37 0.01
## [2,] 1 -8.37 0.01
## [3,] 2 -5.70 0.01
## [4,] 3 -4.67 0.01
## [5,] 4 -4.23 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
Series is stationary!! Let’s see if there are any significant dependencies here…
index1=seq(1,10)
White.LB <- rep(NA, 10)
for(i in 1:10){
White.LB[i] <- Box.test(hurrican.ts, lag=i, type="Ljung-Box", fitdf = 0)$p.value
}
white.dat=data.frame(cbind(White.LB[1:10],index1[1:10]))
colnames(white.dat)=c("pvalues","Lag")
ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Hurricane data",x="Lags",y="p-values")
There is definitely something to be modeled here!! Let’s try an automated search first…
model1=auto.arima(hurrican.ts)
model2=auto.arima(hurrican.ts,d=0)
Let’s take a look at ACF and PACF plots and see how well we do manually..
ggAcf(hurrican.ts)
ggPacf(hurrican.ts)
Using the graphs and some trial and error, here was the model I chose…
model3=Arima(hurrican.ts,order=c(2,0,3))
summary(model3)
## Series: hurrican.ts
## ARIMA(2,0,3) with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 mean
## 0.7921 0.1100 -0.7257 -0.1803 0.1578 91.4046
## s.e. 0.4161 0.3958 0.4094 0.3583 0.0791 1.8812
##
## sigma^2 estimated as 94.76: log likelihood=-569.79
## AIC=1153.58 AICc=1154.34 BIC=1174.88
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.07215997 9.544078 7.471114 -1.024043 8.28476 0.7050813
## ACF1
## Training set -0.0003383851
Comparing the ACF and PACF plots for these models:
ggAcf(model1$residuals,lag.max = 10)
ggPacf(model1$residuals,lag.max = 10)
ggAcf(model2$residuals,lag.max = 10)
ggPacf(model2$residuals,lag.max = 10)
ggAcf(model3$residuals,lag.max = 10)
ggPacf(model3$residuals,lag.max = 10)
Let’s take a look at white noise for each model:
index1=seq(1,10)
White.LB <- rep(NA, 10)
for(i in 2:10){
White.LB[i] <- Box.test(model1$residuals, lag=i, type="Ljung-Box", fitdf = 1)$p.value
}
white.dat=data.frame(cbind(White.LB[2:10],index1[2:10]))
colnames(white.dat)=c("pvalues","Lag")
ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Model 1",x="Lags",y="p-values")
White.LB <- rep(NA, 10)
for(i in 3:10){
White.LB[i] <- Box.test(model2$residuals, lag=i, type="Ljung-Box", fitdf = 2)$p.value
}
white.dat=data.frame(cbind(White.LB[3:10],index1[3:10]))
colnames(white.dat)=c("pvalues","Lag")
ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Model 2",x="Lags",y="p-values")
White.LB <- rep(NA, 10)
for(i in 6:10){
White.LB[i] <- Box.test(model3$residuals, lag=i, type="Ljung-Box", fitdf = 5)$p.value
}
white.dat=data.frame(cbind(White.LB[6:10],index1[6:10]))
colnames(white.dat)=c("pvalues","Lag")
ggplot(white.dat,aes(x=factor(Lag),y=pvalues))+geom_col()+labs(title="Model 3",x="Lags",y="p-values")
ggplot(data.frame(resid = as.numeric(model1$residuals)), aes(x = resid)) +
geom_histogram(bins = 30) +
labs(title = 'Histogram of Residuals for Model 1', x = 'Residuals', y = 'Frequency')
ggplot(data.frame(resid = as.numeric(model2$residuals)), aes(x = resid)) +
geom_histogram(bins = 30) +
labs(title = 'Histogram of Residuals for Model 2', x = 'Residuals', y = 'Frequency')
ggplot(data.frame(resid = as.numeric(model3$residuals)), aes(x = resid)) +
geom_histogram(bins = 30) +
labs(title = 'Histogram of Residuals for Model 3', x = 'Residuals', y = 'Frequency')
You can now forecast the data with these models:
forecast::forecast(model1, h = 10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 156 94.59774 82.06430 107.1312 75.42950 113.7660
## 157 94.59774 82.00783 107.1877 75.34314 113.8524
## 158 94.59774 81.95162 107.2439 75.25716 113.9383
## 159 94.59774 81.89565 107.2998 75.17157 114.0239
## 160 94.59774 81.83993 107.3556 75.08635 114.1091
## 161 94.59774 81.78445 107.4110 75.00151 114.1940
## 162 94.59774 81.72921 107.4663 74.91703 114.2785
## 163 94.59774 81.67421 107.5213 74.83291 114.3626
## 164 94.59774 81.61944 107.5760 74.74914 114.4463
## 165 94.59774 81.56490 107.6306 74.66573 114.5298
autoplot(forecast::forecast(model1, h = 10))
autoplot(forecast::forecast(model2, h = 10))
autoplot(forecast::forecast(model3, h = 10))
Seasonality is the component of a time series that represents the effects of seasonal variation within the dataset. This seasonal effect can be thought of as repetitive behavior that recurs every S time periods, where S is the length of the seasonal period. Remember though that seasonal data is not stationary, because the series does not revert to an overall, constant mean.
Let’s explore the U.S. airline passenger data from 1990 to 2007.
First, we load the dataset into a time series object using the ts function. We use the start = option to define the starting point for our dataset, which is January of 1990. The frequency = option specifies the length of the seasonal period in our dataset. Our data is monthly with an annual seasonal pattern, so our frequency here is 12. From there we split our data into two pieces - training and testing - using the subset function. For the training set, we subset the data to exclude the last 12 months. These last 12 months will be our test data set.
# Load the Data
Passenger <- ts(USAirlines$Passengers, start = 1990, frequency =12)
autoplot(Passenger) + labs(title="Time Series plot for Passengers", x="Date",y="Passengers")
# Create training set from overall Airlines Data
training <- subset(Passenger, end = length(Passenger)-12)
# Create test set from overall Airlines Data
test <- subset(Passenger, start = length(Passenger)-11)
Now, let’s explore our training data by looking at a time series decomposition. This time series decomposition breaks down the data into three pieces - season, trend, and error.
# Time Series Decomposition ...STL#
decomp_stl <- stl(Passenger, s.window = 7)
# Plot the individual components of the time series
plot(decomp_stl)
From the above plot we can see the annual season of our data as well as the upward trending pattern. We do have a shock to the system from the September 11 attacks, which impacted US airline travel for a number of months. We will have to account for this in our modeling as we go along. We can even see this shock in the error term of our decomposition, as it resulted in large errors that were out of pattern with the rest of the data.
A great first place to start with any time series dataset is to build an exponential smoothing model. Exponential smoothing models are great first models as they are easy to build and relatively accurate. This allows us to set a baseline to try and beat with more sophisticated modeling approaches.
To build a Holt-Winters exponential smoothing model that accounts for both the trend and seasonality in our data, we use the hw function. With the seasonal = option we can specify whether our data has additive or multiplicative seasonality. The initial = option specifies that the initial values of the HW model are optimized in the actual model building process rather than estimated with a simple calculation like the overall mean. The h = option specifies that we want to forecast 12 time periods into the future on our training dataset.
# Fit Holt-Winters ESM (multiplicative seasonality) on training data
HWES.USAir.train <- hw(training, seasonal = "multiplicative", initial='optimal', h=12)
Now that we have our model, let’s plot the forecast as well as evaluate this forecast with our test dataset. The mean element in the ESM model object gives the 12 month forecast that we specified above. We can look at the difference between this forecast and our test dataset - what actually happened in those 12 months. From there we calculate the MAE and MAPE of this forecast. The autoplot function is used to visualize the forecast itself.
# Calculate prediction errors from forecast
HW.error <- test - HWES.USAir.train$mean
# Calculate prediction error statistics (MAE and MAPE)
HW.MAE <- mean(abs(HW.error))
HW.MAPE <- mean(abs(HW.error)/abs(test))*100
HW.MAE
## [1] 1134.58
HW.MAPE
## [1] 1.763593
autoplot(HWES.USAir.train)+
autolayer(fitted(HWES.USAir.train),series="Fitted")+ylab("Airlines Passengers")+ geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")
From the output above, we see that we have an MAE of 1134.58 and a MAPE of 1.76%. This gives us a good baseline to try and beat with further modeling approaches.
Exponential smoothing models do not require data to be stationary. However, ARIMA models do require stationary data for modeling. Since seasonal data is not stationary, we must account for this lack of stationarity in one of two broad categories of approaches - deterministic solutions and stochastic solutions.
Some examples of deterministic solutions to seasonality are seasonal dummy variables, Fourier transforms, and predictor variables. The stochastic solution to seasonality is taking a seasonal difference. Be careful, as the choice between these solutions matters for modeling. This is why we need to evaluate which approach we need using seasonal unit root testing. Similar to trending data, unit root testing allows us to determine if differencing is the proper solution to make our seasonal data stationary. There are many tests for seasonal unit root testing. Each test is a little different in what the null and alternative hypotheses are, so we must be careful to know what we are looking at in the results.
The nsdiffs function runs seasonal unit root tests and reports the number of seasonal differences that are needed for your dataset. If 0, that does not mean your data is stationary. It just means that your seasonality cannot be solved with differencing. If more than 0, then seasonal differencing is the solution to the seasonality problem of our data.
training %>% nsdiffs()
## [1] 1
cbind("Airlines Passengers" = training,
"Annual change in Passengers" = diff(training, 12)) %>%
autoplot(facets=TRUE) +
xlab("Time") + ylab("") +
ggtitle("Comparison of Differenced Data to Original")
We can see from the output above that we need one seasonal difference to account for the seasonality in our dataset. That means we need to solve our seasonality stochastically. The plot shows both the original data as well as the differenced data. This is not the end of our unit root testing, however. There might still be a “regular” unit root in our data. Our data does not appear to be trending, but it still might contain a unit root. Again, we will want to test whether our data requires an additional difference or whether it is stationary.
The ndiffs function applied to the differenced data will help with this. The diff function is used with a lag = 12 to take our seasonal difference.
training %>% diff(lag = 12) %>% ndiffs()
## [1] 0
From above we see that we require no more additional differencing. Since our data no longer appears to have a trend or season, we are ready to go to ARIMA modeling.
If our dataset required a deterministic solution instead of a stochastic one, how would we go about solving this? Again, our dataset does not need a deterministic solution, but let’s explore how we would solve seasonality deterministically for the sake of completeness.
Unlike trend, there are many different approaches to accounting for seasonality deterministically. The approaches we will discuss are seasonal dummy variables, Fourier transforms, and predictor variables.
Seasonal dummy variables are exactly what their name implies. We use a set of dummy variables to account for the seasonal effects in our dataset. For a time series with S periods within a season, there will be S - 1 dummy variables - one for each period, with the remaining period accounted for by the intercept. Since our dataset is monthly, we build a dummy variable for each month and leave one out as the reference level. For example, with December as the reference, we could build the model Y_t = b0 + b1 M1_t + b2 M2_t + ... + b11 M11_t + e_t, where M1_t through M11_t are the dummy variables for January through November.
There are a lot of ways to build month variables in a dataset. You could extract month from a time stamp variable for example. Here, we create our own month variable by repeating the values 1 to 12 across our dataset using the rep function. We then convert this to a factor with the factor function as well as set the reference level using the relevel function.
# Seasonal Dummy Variables
Month <- rep(1:12, length.out = length(training))
M <- factor(Month)
M <- relevel(M, ref="12")
By using an lm function we can see how building a linear regression with this factor variable would look. This is the exact same thing that is going on with the modeling in our data. If we were to use this factor variable in an ARIMA modeling function like auto.arima, the function will first build a linear regression and then evaluate which ARIMA model we need on the residuals from our model. Essentially, this “removes” the impact of the season from our dataset. The auto.arima function’s xreg = option is how we incorporate any external variables (here our dummy variables) into the model. The seasonal = FALSE option tells the function to not try and account for season itself since we are accounting for it with seasonal dummy variables.
Season.Lin <- lm(training ~ M)
summary(Season.Lin)
##
## Call:
## lm(formula = training ~ M)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16257.2 -6848.9 484.4 6368.3 14699.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48761.5 2002.3 24.353 <2e-16 ***
## M1 -4461.9 2792.1 -1.598 0.1116
## M2 -5433.9 2792.1 -1.946 0.0531 .
## M3 4099.8 2792.1 1.468 0.1436
## M4 814.9 2831.7 0.288 0.7738
## M5 1951.8 2831.7 0.689 0.4915
## M6 4844.6 2831.7 1.711 0.0887 .
## M7 7504.9 2831.7 2.650 0.0087 **
## M8 7297.4 2831.7 2.577 0.0107 *
## M9 -3242.5 2831.7 -1.145 0.2536
## M10 1064.1 2831.7 0.376 0.7075
## M11 -1268.2 2831.7 -0.448 0.6548
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8256 on 195 degrees of freedom
## Multiple R-squared: 0.2097, Adjusted R-squared: 0.1651
## F-statistic: 4.703 on 11 and 195 DF, p-value: 2.227e-06
M.Matrix <- model.matrix(~M)
Trend <- 1:length(training)
SD.ARIMA <- auto.arima(training, xreg = M.Matrix[,2:12], method="ML", seasonal = FALSE)
summary(SD.ARIMA)
## Series: training
## Regression with ARIMA(1,1,1) errors
##
## Coefficients:
## ar1 ma1 drift M1 M2 M3 M4
## 0.4292 -0.7971 120.7148 -3947.9347 -5040.318 4373.0410 1776.4821
## s.e. 0.1142 0.0773 47.0832 485.1706 583.269 625.8909 653.3822
## M5 M6 M7 M8 M9 M10
## 2774.5010 5539.126 8075.0713 7745.7280 -2913.554 1275.9094
## s.e. 664.7169 667.948 665.0425 655.1154 633.936 589.7366
## M11
## -1168.1437
## s.e. 486.9715
##
## sigma^2 estimated as 3751114: log likelihood=-1844.41
## AIC=3718.82 AICc=3721.34 BIC=3768.74
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -13.60989 1865.287 1119.089 -0.1822502 2.399032 0.4156601
## ACF1
## Training set -0.002860136
There are some advantages and disadvantages to the seasonal dummy variable approach. The advantages are that seasonal dummy variables have nice interpretability - they tell us the average impact each seasonal period has on the target variable - and the approach is rather straightforward to implement. The main disadvantages are that if your seasonal period is especially long or complex, seasonal dummy variables become burdensome, and that the constant effect of each season that they assume may not be appropriate.
Harmonic regression using Fourier terms is another approach to account for seasonality deterministically. Fourier showed that a series of sine and cosine terms of the right frequencies can approximate periodic patterns in a data series. To do this, we add pairs of Fourier variables to a regression model to account for the seasonal pattern: sine terms sin(2πkt/S) and cosine terms cos(2πkt/S) for k = 1, 2, …, up to at most S/2 pairs.
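To make the Fourier variables concrete, here is a minimal base-R sketch of the sine/cosine design matrix (the helper name make_fourier is made up for illustration; the forecast package's fourier function produces this same kind of matrix, with columns named like S1-12 and C1-12):

```r
# Build K pairs of Fourier terms for a seasonal period S at time points t
make_fourier <- function(t, S, K) {
  X <- do.call(cbind, lapply(1:K, function(k)
    cbind(sin(2 * pi * k * t / S), cos(2 * pi * k * t / S))))
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(1:K, each = 2), "-", S)
  X
}

X <- make_fourier(t = 1:24, S = 12, K = 2)  # two years of monthly data
head(X, 3)
```

One quirk worth knowing: at K = S/2 the sine term sin(2πKt/S) = sin(πt) is zero at every integer time point, so only the cosine term survives - which is why a monthly model with K = 6 ends up with a C6-12 term but no S6-12 term.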
Here, we use Fourier terms to account for seasonality by adding the fourier function to the auto.arima function through the xreg = option. All of this is contained within a loop where we gradually increase the number of Fourier terms in the regression; the K = option in the fourier function does this. We loop through the first 6 Fourier terms (half of the seasonal length) and record the training BIC for each model. This BIC is reported on a plot with the 6 different forecasts (one from each model). The gridExtra package’s grid.arrange function allows the plots to all be put together.
plots <- list()
for (i in seq(6)) {
fit <- auto.arima(training, xreg = fourier(training, K = i),
seasonal = FALSE, lambda = NULL)
plots[[i]] <- autoplot(forecast::forecast(fit,
xreg = fourier(training, K=i, h=12))) +
xlab(paste("K=",i," BIC=",round(fit$bic,2))) +
ylab("") + ylim(30000,80000)
}
gridExtra::grid.arrange(
plots[[1]],plots[[2]],plots[[3]],
plots[[4]],plots[[5]],plots[[6]], nrow=3)
From the above output we see that the best model had 6 Fourier terms. So we can build this model to account for the seasonality in our dataset.
F.ARIMA <- auto.arima(training, xreg = fourier(training, K = 6), seasonal = FALSE)
summary(F.ARIMA)
## Series: training
## Regression with ARIMA(1,1,1) errors
##
## Coefficients:
## ar1 ma1 drift S1-12 C1-12 S2-12 C2-12
## 0.4289 -0.7970 120.6974 -1232.3084 -4334.8049 313.9197 677.9491
## s.e. 0.1142 0.0773 47.1030 270.2688 270.3805 194.1284 193.6296
## S3-12 C3-12 S4-12 C4-12 S5-12 C5-12 C6-12
## -2561.1131 1291.3900 413.8895 208.6503 2314.3804 274.0798 341.8763
## s.e. 152.7962 153.1557 130.4249 130.6610 118.9888 119.5082 81.8796
##
## sigma^2 estimated as 3751119: log likelihood=-1844.41
## AIC=3718.82 AICc=3721.34 BIC=3768.74
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -13.55273 1865.288 1119.117 -0.1820955 2.399079 0.4156704
## ACF1
## Training set -0.002801581
There are some advantages and disadvantages of the Fourier term approach to accounting for seasonality. The main advantage is that Fourier terms can handle long and complex seasonality. In fact, for multiple seasons in a dataset, we can add more Fourier variables at different frequencies to account for this. The disadvantages of the Fourier approach are that the Fourier terms themselves are not really interpretable and that we need to use trial and error to find the “right” number of terms to use.
After removing the seasonality through deterministic solutions, the remaining errors (residuals) are modeled with seasonal ARIMA models. The key to these seasonal ARIMA models is that there still might be effects at seasonal lags in the dataset, even though the main seasonality is accounted for.
The stochastic approach has been hinted at above. When a stochastic solution is best to solve seasonality, we need to take seasonal differences. A difference on a season is the difference between the current point and the same point in the previous season: Y_t - Y_{t-S}. For our dataset, it can be thought of as the year-over-year change in our data.
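Concretely, with S = 12 a seasonal difference is just the current value minus the value from the same month one year earlier; a quick base-R check on made-up monthly values:

```r
# Two years of made-up monthly values
x <- c(10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       13, 15, 18, 19, 22, 23, 26, 27, 30, 31, 34, 35)

# One seasonal difference at lag 12
d <- diff(x, lag = 12)

# diff(x, lag = 12) is exactly "this month minus the same month last year"
all.equal(d, x[13:24] - x[1:12])
```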
Here the ggtsdisplay function allows us to see the actual difference along with its correlation pattern with ACF and PACF plots. Notice, we are looking at these ACF and PACF plots with regards to the differences on the training data using the diff function with lag = 12 specifying the length of the difference being the same as our season.
training %>% diff(lag = 12) %>% ggtsdisplay()
After we remove the seasonal effect through stochastic approaches, the remaining differences are modeled with seasonal ARIMA models. The key to these new seasonal ARIMA models is that there still might be effects at seasonal lags in the dataset, even though the main seasonality is accounted for.
There are some limitations to differencing. Differencing is hard to evaluate for long and complex seasons because the statistical tests for stochastic differencing typically end at a season length of 24. Therefore, long and/or complex seasons are typically best approached with deterministic solutions. In fact, it is hard to imagine a difference being very long in terms of time points logically. For example, if you had daily data and thought you had an annual season, it is hard to imagine an actual impact of January 26 from last year on January 26 of this year.
When extending the ARIMA model framework to the seasonal ARIMA framework, we add another set of terms - P, D, Q, and S. Notice how these terms are capitalized.
The terms represent the number of seasonal AR terms (P), the number of seasonal MA terms (Q), and the number of seasonal differences (D). The length of the season is still defined as S. Seasonal ARIMA models have the same structure and approach as typical ARIMA models, with AR and MA patterns in the ACF and PACF. The main difference is that the pattern appears on the seasonal lags instead of the individual lags. For example, if you had an ARIMA(0,0,0)(1,0,0)[12] model, then the ACF would have an exponentially decreasing pattern every 12 lags while the PACF has only a single spike at lag 12. The pattern is the same, but it is only seen on the season since we are dealing with the seasonal lag. The opposite would be true for an ARIMA(0,0,0)(0,0,1)[12] model: the PACF would have an exponentially decreasing pattern every 12 lags while the ACF has only a single spike at lag 12.
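To see this pattern on simulated data, here is a sketch (base R only, not from the class code) of a purely seasonal AR(1) with coefficient 0.8 on lag 12, built with a recursive filter:

```r
# Simulate y_t = 0.8 * y_{t-12} + e_t
set.seed(123)
e <- rnorm(240)
y <- stats::filter(e, filter = c(rep(0, 11), 0.8), method = "recursive")

# The autocorrelation concentrates on the seasonal lags (12, 24, ...),
# decaying geometrically, while non-seasonal lags stay near zero
r <- acf(y, lag.max = 36, plot = FALSE)$acf
round(c(lag1 = r[2], lag12 = r[13], lag24 = r[25]), 2)
```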
For our data we can try an ARIMA(1,0,0)(1,1,1)[12] model using the Arima function. With either the ggtsdisplay function on the residuals or the checkresiduals function on the model object, we can see that this model still doesn’t capture all of the signal and pattern in the dataset.
# Seasonal ARIMA
training %>%
Arima(order=c(1,0,0), seasonal=c(1,1,1)) %>%
residuals() %>% ggtsdisplay()
S.ARIMA <- Arima(training, order=c(1,0,0), seasonal=c(1,1,1))
summary(S.ARIMA)
## Series: training
## ARIMA(1,0,0)(1,1,1)[12]
##
## Coefficients:
## ar1 sar1 sma1
## 0.9056 0.0917 -0.672
## s.e. 0.0364 0.1091 0.093
##
## sigma^2 estimated as 4126436: log likelihood=-1763.94
## AIC=3535.88 AICc=3536.09 BIC=3548.97
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 338.6503 1956.379 1156.221 0.5565257 2.418163 0.4294517 -0.2622466
checkresiduals(S.ARIMA)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,0)(1,1,1)[12]
## Q* = 45.934, df = 21, p-value = 0.001304
##
## Model df: 3. Total lags used: 24
We can still use the auto.arima function to select the “best” starting point model for us as well. We can use the original data in this function along with the seasonal = TRUE option to allow the function to take the seasonal difference for us.
S.ARIMA <- auto.arima(training, method="ML", seasonal = TRUE)
summary(S.ARIMA)
## Series: training
## ARIMA(1,0,1)(0,1,1)[12] with drift
##
## Coefficients:
## ar1 ma1 sma1 drift
## 0.8800 -0.2962 -0.6785 124.9788
## s.e. 0.0454 0.0950 0.0600 23.6330
##
## sigma^2 estimated as 3639517: log likelihood=-1751.67
## AIC=3513.34 AICc=3513.66 BIC=3529.7
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -4.332616 1832.54 1055.07 -0.1745474 2.217472 0.3918815 0.01300462
checkresiduals(S.ARIMA)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(0,1,1)[12] with drift
## Q* = 21.957, df = 20, p-value = 0.3428
##
## Model df: 4. Total lags used: 24
Above we can see that the auto.arima function selected an ARIMA(1,0,1)(0,1,1)[12] with drift model. By using the checkresiduals function we see that we are left with white noise after our modeling. We still have a large outlier that we will have to account for in the next section on dynamic regression.
We can use the forecast function with h = 12 to forecast the next 12 observations in our dataset. Similar to the exponential smoothing model above, we plot the forecast as well as evaluate the MAE and MAPE from this model.
forecast::forecast(S.ARIMA, h = 12)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2007 63889.23 61444.35 66334.12 60150.11 67628.36
## May 2007 65213.49 62382.40 68044.59 60883.71 69543.27
## Jun 2007 68462.20 65364.91 71559.48 63725.31 73199.09
## Jul 2007 71439.69 68151.01 74728.37 66410.09 76469.29
## Aug 2007 68975.83 65546.25 72405.40 63730.75 74220.90
## Sep 2007 57619.49 54084.66 61154.33 52213.43 63025.56
## Oct 2007 62959.30 59345.04 66573.56 57431.77 68486.83
## Nov 2007 61127.43 57452.84 64802.02 55507.63 66747.23
## Dec 2007 62364.86 58644.21 66085.50 56674.63 68055.09
## Jan 2008 58193.22 54437.30 61949.14 52449.04 63937.41
## Feb 2008 56375.46 52592.44 60158.48 50589.83 62161.08
## Mar 2008 68502.52 64698.65 72306.39 62685.00 74320.04
autoplot(forecast::forecast(S.ARIMA, h = 12)) + autolayer(fitted(S.ARIMA), series="Fitted") +
ylab("Airlines Passengers") +
geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")
# Calculate prediction errors from forecast
S.ARIMA.error <- test - forecast::forecast(S.ARIMA, h = 12)$mean
# Calculate prediction error statistics (MAE and MAPE)
S.ARIMA.MAE <- mean(abs(S.ARIMA.error))
S.ARIMA.MAPE <- mean(abs(S.ARIMA.error)/abs(test))*100
S.ARIMA.MAE
## [1] 1229.206
S.ARIMA.MAPE
## [1] 1.885399
From the above output we see that our seasonal ARIMA did not beat the Holt-Winters exponential smoothing model in terms of MAE or MAPE. That outlier might be impacting the estimation of our model, so we will have to address that.
The default for the Arima function in R (which is what auto.arima is built off of) uses the multiplicative approach to seasonal ARIMA models. To get an additive seasonal ARIMA model instead we need to use the fixed = option in the Arima function to specify exactly what terms we want to estimate instead as shown below. The zeroes are for terms we do not want to estimate, while the NA values will be estimated by the function.
# Additive Seasonal ARIMA
S.ARIMA <- Arima(training, order=c(1,0,13), seasonal=c(0,1,0),
fixed=c(NA,NA,0,0,0,0,0,0,0,0,0,0,NA,NA), method="ML")
summary(S.ARIMA)
## Series: training
## ARIMA(1,0,13)(0,1,0)[12]
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8 ma9 ma10 ma11
## 0.9679 -0.3698 0 0 0 0 0 0 0 0 0 0
## s.e. 0.0237 0.0880 0 0 0 0 0 0 0 0 0 0
## ma12 ma13
## -0.6612 0.2490
## s.e. 0.0626 0.0766
##
## sigma^2 estimated as 3792723: log likelihood=-1755.53
## AIC=3521.07 AICc=3521.39 BIC=3537.43
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 216.2511 1870.713 1094.634 0.3140324 2.29433 0.4065768 0.02300017
Predictor variables are used for a variety of things. Previously we have seen them used to account for trend and for seasonality. However, we can also use external variables to make our forecasts better. Variables such as holiday effects, sales promotions, economic factors, and changes in policy are just a few examples.
Predictor variables are incorporated in a regression, and the time series component is applied to the errors from this regression model: Y_t = b0 + b1 X_t + Z_t, where Z_t follows an ARIMA process.
An intervention variable is a variable that contains discrete values flagging the occurrence of an event affecting the response series. These variables are used to model and forecast the series itself or to analyze the impact of the specific event. For example, we can measure the impact of a previous sales promotion and forecast a future sales promotion’s impact. We add these discrete variables to models to adjust the intercept of the model during the events.
The three most common types of intervention variables are:
A point intervention is typically denoted with a binary variable that flags when the event occurs by taking a value of 1, with all other values set to zero. By putting this variable in our model, the coefficient on the intervention variable in the regression measures the estimated impact of that intervention.
A step intervention is typically denoted with a binary variable that flags when an event occurs as well as the time period that the effects of the event last. For example, if you have a change in policy, you would have 0 values for dates before the policy change and values of 1 for every date after the change in policy. By putting this variable in our model, the coefficient on the intervention variable in the regression measures the estimated impact of that intervention’s shift.
A ramp intervention is typically denoted by 0 values before an event and values that increase by 1 (1,2,3, etc.) starting with the event time point itself. By putting this variable in our model, the coefficient on the intervention variable in the regression measures the estimated slope of this new relationship after the event.
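The three shapes are easy to see side by side. Here is a small base-R sketch with a hypothetical event at time period 6 of a length-10 series (any of these vectors could be passed to the xreg = option, just like the Sep11 variable below):

```r
n <- 10
event <- 6

point <- as.numeric(seq_len(n) == event)  # 1 only at the event itself
step  <- as.numeric(seq_len(n) >= event)  # 1 from the event onward
ramp  <- pmax(seq_len(n) - event + 1, 0)  # 0 before; 1, 2, 3, ... starting at the event

rbind(point, step, ramp)
```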
For our dataset we have a point intervention in September 2001. We can easily create a binary variable that takes a value of 1 during that month and 0 otherwise. By using the xreg = option in the auto.arima function, we can add this intervention to our model as well as build out a seasonal ARIMA. We use the summary and checkresiduals functions to evaluate our model.
Sep11 <- rep(0, 207)
Sep11[141] <- 1
Full.ARIMA <- auto.arima(training, seasonal = TRUE, xreg = Sep11, method = "ML")
summary(Full.ARIMA)
## Series: training
## Regression with ARIMA(2,0,0)(0,1,1)[12] errors
##
## Coefficients:
## ar1 ar2 sma1 drift xreg
## 0.7119 0.1369 -0.6626 125.6161 -11539.935
## s.e. 0.0709 0.0713 0.0579 22.6471 1150.961
##
## sigma^2 estimated as 2442761: log likelihood=-1712.17
## AIC=3436.35 AICc=3436.8 BIC=3455.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.349235 1497.38 1030.214 -0.1299989 2.123044 0.3826493
## ACF1
## Training set 0.008315115
checkresiduals(Full.ARIMA)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0)(0,1,1)[12] errors
## Q* = 21.697, df = 19, p-value = 0.2996
##
## Model df: 5. Total lags used: 24
We can see from the output above that our model now accounts for the outlier that we previously saw in our data. Notice how our model has changed slightly after accounting for this outlier. This will improve the forecasts going forward, even though we do not have any outliers in our validation dataset.
To forecast these values going forward, we need future values of our predictor variable. For this dataset, we have our Sep11 variable that we forecast to have values of 0 for each of the 12 observations we are forecasting for airline passengers. To do this we create another variable (of the same name) that has future values. This new variable is then put into the xreg = option inside of the forecast function. Notice how this forecast function comes from the forecast package with the input being the model we previously built. The forecast function is expecting a variable in the xreg = option with the same name as in our model.
Sep11 <- rep(0, 12)
forecast::forecast(Full.ARIMA, xreg = Sep11, h = 12)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2007 63955.27 61952.29 65958.25 60891.98 67018.57
## May 2007 65316.30 62857.59 67775.01 61556.03 69076.57
## Jun 2007 68554.32 65778.00 71330.63 64308.31 72800.32
## Jul 2007 71525.80 68534.61 74516.99 66951.16 76100.43
## Aug 2007 68990.75 65846.48 72135.01 64182.01 73799.48
## Sep 2007 58181.07 54925.92 61436.22 53202.75 63159.39
## Oct 2007 63003.05 59666.52 66339.58 57900.27 68105.83
## Nov 2007 61187.86 57791.11 64584.60 55992.98 66382.73
## Dec 2007 62406.11 58964.54 65847.68 57142.69 67669.53
## Jan 2008 58243.83 54768.77 61718.89 52929.18 63558.48
## Feb 2008 56374.88 52874.71 59875.05 51021.83 61727.92
## Mar 2008 68567.34 65048.31 72086.37 63185.45 73949.23
autoplot(forecast::forecast(Full.ARIMA, xreg = Sep11, h = 12)) + autolayer(fitted(Full.ARIMA), series="Fitted") +
ylab("Airlines Passengers") +
geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")
We can plot our forecasts just as we have done previously.
Most forecasting models also need to account for explanatory variables such as price, advertising spending, or income. These kinds of models go by several names: dynamic regression, ARIMAX, or transfer function models. Through the examples above, we have seen how to implement this in R.
However, there are often lagged effects of the predictor variables as well as (or instead of) immediate impacts. In other words, previous values of the predictor variables may still play a role in the current prediction of the target variable. The question becomes: how many lags of a variable need to be included in the model? There are multiple ways to evaluate how many lags of a predictor variable you need.
The first, more theoretical, approach involves cross correlations and pre-whitening of series. This approach is time consuming, requires building a model for the predictor variables themselves, and is therefore best used for small numbers of predictor variables.
The second approach evaluates many different combinations of lags of the predictor variable across many different models. These models are compared on a metric like AIC/BIC on a validation dataset to see which model fits the data best. This approach is more efficient, handles many variables much more easily, and is similar in accuracy to the first approach.
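As a rough sketch of this second approach, the code below (on synthetic data, not the airline series) fits candidate models with 0 through 3 lags of a predictor and compares them on AICc. For brevity it compares in-sample AICc rather than validation error; all names and numbers are illustrative.

```r
# Hypothetical sketch: pick the number of predictor lags by fitting models
# with 0-3 lags of x and comparing AICc (synthetic data, for illustration)
library(forecast)
set.seed(123)
n <- 120
x <- rnorm(n)
# target depends on x at lags 0 and 1, plus AR(1) noise
y <- 5 + 2 * x + c(0, head(x, -1)) + as.numeric(arima.sim(list(ar = 0.5), n))
aicc <- sapply(0:3, function(k) {
  # matrix holding lags 0..k of x (lag j has NAs in its first j rows)
  X <- sapply(0:k, function(j) c(rep(NA, j), head(x, n - j)))
  # fit every candidate on the same rows (4:n) so AICc values are comparable
  auto.arima(y[4:n], xreg = X[4:n, , drop = FALSE])$aicc
})
aicc                 # smaller is better
which.min(aicc) - 1  # number of lags with the lowest AICc
```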
For our model, the impact of September 11, 2001 was felt for many months (lag impacts). In the code below we add six lags as well as one seasonal lag (the first anniversary of September 11, 2001) to the model through the same xreg = option in auto.arima.
# Intervention Analysis with Lags
Sep11 <- rep(0, 207)
Sep11[141] <- 1
Sep11.L1 <- rep(0, 207)
Sep11.L1[142] <- 1
Sep11.L2 <- rep(0, 207)
Sep11.L2[143] <- 1
Sep11.L3 <- rep(0, 207)
Sep11.L3[144] <- 1
Sep11.L4 <- rep(0, 207)
Sep11.L4[145] <- 1
Sep11.L5 <- rep(0, 207)
Sep11.L5[146] <- 1
Sep11.L6 <- rep(0, 207)
Sep11.L6[147] <- 1
Anniv <- rep(0, 207)
Anniv[153] <- 1
Full.ARIMA <- auto.arima(training, seasonal = TRUE, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), method = "ML")
summary(Full.ARIMA)
## Series: training
## Regression with ARIMA(2,0,0)(1,1,1)[12] errors
##
## Coefficients:
## ar1 ar2 sar1 sma1 drift Sep11 Sep11.L1
## 0.6298 0.2207 0.1926 -0.696 124.7562 -17400.420 -12116.115
## s.e. 0.0714 0.0726 0.1143 0.081 21.1622 1162.401 1271.324
## Sep11.L2 Sep11.L3 Sep11.L4 Sep11.L5 Sep11.L6 Anniv
## -8076.014 -7670.030 -4344.649 -2173.140 -749.6299 -2306.1784
## s.e. 1387.179 1427.366 1403.914 1271.271 1105.3247 998.2399
##
## sigma^2 estimated as 1736410: log likelihood=-1673.71
## AIC=3375.42 AICc=3377.75 BIC=3421.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 1.076103 1235.596 944.9564 -0.0820269 1.937634 0.3509825
## ACF1
## Training set 0.007704655
checkresiduals(Full.ARIMA)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,0,0)(1,1,1)[12] errors
## Q* = 16.046, df = 11, p-value = 0.1394
##
## Model df: 13. Total lags used: 24
From the output above, we see that the model's residuals resemble white noise, and there do not appear to be any more intervention impacts in the dataset.
Forecasting in time series with only lagged values of the target variable is easy because of the recursive nature of the model - one prediction feeds into the next prediction. Forecasting with external variables is much trickier. For intervention variables it is easier, since we should know future holiday, promotion, etc. values. However, some variables need forecasting themselves. We could either use an external estimate of future values or forecast the future values ourselves.
To forecast this series, we need to have future values of these input variables. Luckily, for our variables, these are just 0 values. We still need to use the forecast function from the forecast package. This function also has an xreg = option where we need future values of these predictor variables. Ideally, these future values have the same name as the original predictor variables for the forecast function to work best. We can plot these updated forecasts as well as calculate the MAE and MAPE.
Sep11 <- rep(0, 12)
Sep11.L1 <- rep(0, 12)
Sep11.L2 <- rep(0, 12)
Sep11.L3 <- rep(0, 12)
Sep11.L4 <- rep(0, 12)
Sep11.L5 <- rep(0, 12)
Sep11.L6 <- rep(0, 12)
Anniv <- rep(0, 12)
forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2007 63937.46 62248.72 65626.20 61354.76 66520.17
## May 2007 65317.01 63321.26 67312.76 62264.77 68369.25
## Jun 2007 68441.10 66189.44 70692.76 64997.48 71884.72
## Jul 2007 71239.57 68817.90 73661.25 67535.94 74943.20
## Aug 2007 68660.79 66113.06 71208.53 64764.37 72557.22
## Sep 2007 58377.67 55736.78 61018.56 54338.78 62416.56
## Oct 2007 63394.60 60683.61 66105.60 59248.49 67540.71
## Nov 2007 61550.88 58786.78 64314.98 57323.55 65778.20
## Dec 2007 62572.13 59767.54 65376.71 58282.89 66861.37
## Jan 2008 58447.60 55612.04 61283.17 54110.98 62784.23
## Feb 2008 56242.81 53383.45 59102.17 51869.80 60615.82
## Mar 2008 68562.05 65684.39 71439.72 64161.04 72963.06
autoplot(forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)) + autolayer(fitted(Full.ARIMA), series="Fitted") +
ylab("Airlines Passengers") +
geom_vline(xintercept = 2007.25,color="orange",linetype="dashed")
# Calculate prediction errors from forecast
Full.ARIMA.error <- test - forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)$mean
# Calculate prediction error statistics (MAE and MAPE)
Full.ARIMA.MAE <- mean(abs(Full.ARIMA.error))
Full.ARIMA.MAPE <- mean(abs(Full.ARIMA.error)/abs(test))*100
Full.ARIMA.MAE
## [1] 1180.991
Full.ARIMA.MAPE
## [1] 1.800255
From the above output we see that the MAE and MAPE have improved compared to the ARIMA model that did not account for September 11, but they are still not better than the exponential smoothing model.
Prophet is a model framework introduced to the public by Facebook in 2018. Facebook uses this algorithm to forecast univariate time series by decomposing it into pieces. This is similar to exponential smoothing, ETS, etc. In the Prophet model structure, the signal is broken down into three pieces - growth/trend, season, and holiday.
The growth/trend component uses trend lines (time) as variables in the model. This trend is a piecewise trend that breaks the pattern into different pieces, using knots to change the slope. The user of the algorithm can specify the knots, or the algorithm will try to select them automatically. This trend can also be a logistic trend that is similar in design to the dampened trend approach to exponential smoothing.
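The piecewise trend idea can be sketched with a simple knot (changepoint) variable; all numbers below are made up for illustration.

```r
# Piecewise-linear trend with one knot at t = 50: the slope is 0.5 before
# the knot and 0.5 + 1.2 = 1.7 after it (illustrative values only)
t <- 1:100
knot <- 50
trend <- 10 + 0.5 * t + 1.2 * pmax(0, t - knot)
plot(t, trend, type = "l", main = "Piecewise Linear Trend with One Knot")
```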
The seasonal component consists of Fourier variables to account for the seasonal pattern. The algorithm was originally designed for daily data with weekly and yearly seasonal effects, though it can be expanded to handle different types of data and seasons. The yearly season is set to 10 Fourier variables by default.
The last component is the holiday component that consists of binary dummy variables to flag holidays in the dataset.
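As a rough illustration of what Fourier seasonal variables look like, the fourier function from the forecast package builds K sine/cosine pairs for a ts object (the monthly data below are fake).

```r
# Each value of K adds one sine/cosine pair, so K = 2 gives 4 columns
library(forecast)
m <- ts(rnorm(36), frequency = 12)  # three years of fake monthly data
head(fourier(m, K = 2))
```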
The prophet package contains all the functions needed for the Prophet algorithm, but it is in a different format than the previous R time series functions. To start, we define our own “holidays” in the dataset - a list of dates that you want flagged with binary variables. For our dataset, this would be September 2001 as well as its 6 lags and the one seasonal lag (anniversary). These must be in a data frame (using the data.frame function) as date objects; the as.Date function will do this. This data frame needs to follow specific naming conventions: the flag variable is called holiday while the date variable is called ds. Next, the training data needs a time variable, also called ds.
Once the dataset is structured and ready, the prophet function can be employed. If we didn’t have our own “holidays” to add to the algorithm, we would just use prophet() to activate the prophet model framework. However, we use the holidays = holidays option to input our own additional holidays. The add_country_holidays function is used to add pre-programmed holidays by country; we use the "US" option to specify US holidays. Next, we use the add_seasonality function to add our own monthly seasonality. The name = 'monthly' option along with period = 30.5 defines the monthly season with 30.5 days per season. We can also specify the number of Fourier variables with the fourier.order = option. Now that we have the structure created, we fit the algorithm using the fit.prophet function on our model object Prof and dataset prophet.data.
# Prophet
holidays <- data.frame(
holiday = 'Sep11',
ds = as.Date(c('2001-09-01', '2001-10-01', '2001-11-01',
'2001-12-01', '2002-01-01', '2002-02-01',
'2002-09-01')),
lower_window = 0,
upper_window = 1
)
prophet.data <- data.frame(ds = seq(as.Date('1990-01-01'), as.Date('2007-03-01'), by = 'm'), y = training)
Prof <- prophet(holidays = holidays)
Prof <- add_country_holidays(Prof, "US")
Prof <- add_seasonality(Prof, name='monthly', period=30.5, fourier.order=6)
Prof <- fit.prophet(Prof, prophet.data)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
Now that the model is created, we can forecast our dataset with the make_future_dataframe function with the periods = 12 and freq = 'month' options to specify that we want 12 months of forecasts. This function structures the dataset (inputs) needed to forecast. The predict function actually does the forecasting. We can plot this forecast with the plot function and calculate our test dataset MAE and MAPE as well.
forecast.data <- make_future_dataframe(Prof, periods = 12, freq = 'month')
predict(Prof, forecast.data)$yhat
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## [1] 34243.76 32495.22 38101.67 36946.97 36975.28 40873.20 42424.04 43324.30
## [9] 35560.23 37836.91 36621.77 36764.24 33790.00 34062.55 38204.16 36368.54
## [17] 37427.79 40037.22 42637.44 42478.52 34665.00 37211.01 35306.58 36104.98
## [25] 32147.97 33338.65 42186.88 39335.63 40361.73 43754.04 46299.78 46256.07
## [33] 36883.19 40290.49 38767.65 39313.00 36637.89 35733.79 43993.84 40868.45
## [41] 42549.45 45082.83 48279.29 47592.35 37742.47 41910.31 39252.29 40910.26
## [49] 36793.29 35568.06 45705.08 43253.50 44616.84 47217.88 50103.76 49707.14
## [57] 40298.61 44257.91 41398.31 43223.39 38703.98 37464.74 46760.78 44882.33
## [65] 46022.57 48589.17 51270.38 51068.62 42611.82 45839.27 43291.56 44771.40
## [73] 40270.39 40037.01 50918.45 47996.47 49076.57 52398.14 54997.88 54883.44
## [81] 44839.38 48901.12 46694.19 47881.48 44621.57 42280.50 52169.55 49182.42
## [89] 50670.29 53335.46 56338.80 55783.64 45154.18 50040.03 47577.52 48978.19
## [97] 45156.01 43737.86 53833.48 51117.84 52683.43 55020.42 58108.55 57447.88
## [105] 47925.97 51936.84 48963.78 50840.35 46207.48 45099.70 54158.09 52680.33
## [113] 53357.91 56323.24 58539.88 58736.86 50175.26 53441.67 50789.16 52307.96
## [121] 47702.15 47230.55 57978.58 55483.21 56125.73 59873.92 62036.09 62348.25
## [129] 52227.29 56354.98 54071.15 55324.39 51987.57 49408.29 56523.72 55091.60
## [137] 55019.95 59240.13 60683.94 61683.80 42578.61 45401.09 44022.57 44335.20
## [145] 41597.00 40608.98 55731.99 53909.94 54577.90 57808.47 59998.97 60231.88
## [153] 41707.48 54716.79 53276.13 53616.42 50515.95 50963.50 57578.32 54751.95
## [161] 56774.26 58392.94 61956.21 60806.54 51070.73 55511.34 52659.94 54377.62
## [169] 49572.92 49996.96 61394.87 57552.87 59542.02 61943.58 65452.38 64417.91
## [177] 54098.06 58424.64 55941.92 57394.05 53858.34 52174.70 64058.38 60577.75
## [185] 62554.62 64726.27 68218.61 67169.95 56529.70 61422.04 57973.67 60356.15
## [193] 55548.10 53213.44 64919.11 62762.80 63765.02 66661.34 69186.09 69084.74
## [201] 59776.95 63569.66 60810.80 62469.29 58050.62 56449.65 67239.55 63788.43
## [209] 66435.48 67429.41 71617.44 69843.01 61233.17 64547.81 61847.06 63414.09
## [217] 58760.04 58849.83 71056.10
plot(Prof, predict(Prof, forecast.data))
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
# Calculate prediction errors from forecast
Prophet.error <- test - tail(predict(Prof, forecast.data)$yhat, 12)
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
# Calculate prediction error statistics (MAE and MAPE)
Prophet.MAE <- mean(abs(Prophet.error))
Prophet.MAPE <- mean(abs(Prophet.error)/abs(test))*100
Prophet.MAE
## [1] 1449.851
Prophet.MAPE
## [1] 2.252305
Above we see that the Prophet algorithm doesn’t perform as well as the previous algorithms we have tried. This should not be too surprising, since the algorithm is just curve-fitting and does not use any lagged values of the target variable. The forecast simply extends this curve into the future.
Neural network models are mathematical models inspired by how brains function. They are organized in a network of neurons across layers. The input variables are the neurons in the bottom layer. The output variable is the final neuron in the top layer. The layers in between, called hidden layers, transform the input variables through non-linear functions to try to best model the output variable.
From one layer to the next, the neurons from the previous layer serve as inputs to each neuron of the next layer, linearly combined with weights (coefficients). Inside each hidden-layer neuron, this linear combination of inputs is transformed with a non-linear function - typically the logistic (sigmoid) function, though it could be any of a variety of non-linear transformations. The result is then output from each neuron into the next hidden layer. The top layer is just a linear combination of the transformed outputs of the neurons from the last hidden layer.
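A toy sketch of a single hidden neuron may help; every number below is made up for illustration.

```r
# One hidden neuron: linearly combine the inputs with weights, then apply
# a logistic (sigmoid) transformation (all values here are illustrative)
sigmoid <- function(z) 1 / (1 + exp(-z))
inputs  <- c(1.2, -0.5, 0.3)  # e.g., lagged values of the series
weights <- c(0.4, -0.1, 0.7)  # learned during training
bias    <- 0.05
hidden  <- sigmoid(bias + sum(weights * inputs))
hidden  # a value between 0 and 1, passed on to the next layer
```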
In autoregressive neural networks, the bottom layer (input variables) also contains lags of the target variable as possible inputs. The number of lags to include is typically chosen through trial and error. We can approach this problem similarly to ARIMA modeling with correlation plots and automatic selection techniques. However, there are no MA terms in these autoregressive neural networks. For seasonal data we typically include all lags up through one season unless correlation plots say you only need specific ones. We do, however, need to make our data stationary first since we are dealing with lagged terms of the target variable.
To build an autoregressive neural network in R, we use the nnetar function. This function cannot take differences of your data so we need to input our differenced training data. The p = and P = options specify the number of normal and seasonal autoregressive lags respectively. From there we just use the forecast function from the forecast package to forecast the next 12 time periods (h = 12 option).
set.seed(12345)
NN.Model <- nnetar(diff(training, 12), p = 2, P = 3)
NN.Forecast <- forecast::forecast(NN.Model, h = 12)
plot(NN.Forecast)
However, the nnetar function only forecasts the series it was given. Since our input was the differences, the function forecasts the differences, not the original data. The for loop in the code below calculates the forecasts of the actual data from these differences. Essentially, to get the first month’s forecast, we add the first forecasted difference to the corresponding month’s value from the final year of our training dataset. Lastly, we can calculate the MAE and MAPE from these forecasts.
# Undo the seasonal difference: add each forecasted difference to the
# same-month value from the final year of the training data
Pass.Forecast <- rep(NA, 12)
for(i in 1:12){
Pass.Forecast[i] <- training[length(training) - 12 + i] + forecast::forecast(NN.Model, h = 24)$mean[i]
}
Pass.Forecast <- ts(Pass.Forecast, start = c(2007, 4), frequency = 12)
plot(training, main = "US Airline Passengers Neural Network Model Forecasts", xlab = "Date", ylab = "Passengers (Thousands)", xlim = c(1990, 2009), ylim = c(30000,80000))
lines(Pass.Forecast, col = "blue")
abline(v = 2007.25, col = "red", lty = "dashed")
# Calculate prediction errors from forecast
NN.error <- test - Pass.Forecast
# Calculate prediction error statistics (MAE and MAPE)
NN.MAE <- mean(abs(NN.error))
NN.MAPE <- mean(abs(NN.error)/abs(test))*100
NN.MAE
## [1] 1087.848
NN.MAPE
## [1] 1.667773
Finally, we have a model that beats the original exponential smoothing model’s forecasts! Before any problem begins, we do not know which algorithm will win, so we try them all.
The thought process around weighted and combined forecasting (also called composite forecasting) is not new. The topic has been studied for years, and empirical evidence indicates that the combination of forecast methods tends to outperform most single forecast methods.
It is better to average forecasts in hope that by so doing, the biases among the methods and/or forecasters will compensate for one another. As a result, forecasts developed using different methods are expected to be more useful in cases of high uncertainty. The method is especially relevant for long-range forecasting, where uncertainty is extremely high. Many different studies have been done to show the value of combining forecasts.
There are two basic weighting methods: simple averaging and weighted averaging.
The simple averaging approach is just as easy as it sounds. It takes the average of the forecasted values from each of the models being combined.
The code below just takes the average of the four models we have been working with for this dataset - the Holt-Winters exponential smoothing model, the dynamic regression ARIMA model, the neural network model, and Prophet algorithm model. We then calculate the MAE and MAPE from this model.
For.Avg <- (HWES.USAir.train$mean +
forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)$mean +
Pass.Forecast +
tail(predict(Prof, forecast.data)$yhat, 12))/4
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
Avg.error <- test - For.Avg
Avg.MAE <- mean(abs(Avg.error))
Avg.MAPE <- mean(abs(Avg.error)/abs(test))*100
Avg.MAE
## [1] 1046.492
Avg.MAPE
## [1] 1.586614
This average of the four models performs better than any of the individual models. However, maybe not all of the models are needed in this average. In fact, trying out many different combinations of models shows that only the exponential smoothing, neural network, and Prophet model are needed.
For.Avg <- (HWES.USAir.train$mean +
Pass.Forecast +
tail(predict(Prof, forecast.data)$yhat, 12)
)/3
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
Avg.error <- test - For.Avg
Avg.MAE <- mean(abs(Avg.error))
Avg.MAPE <- mean(abs(Avg.error)/abs(test))*100
Avg.MAE
## [1] 1001.659
Avg.MAPE
## [1] 1.5154
The MAE and MAPE are lower than any of the individual models as well as the previous average across all models.
The second method uses weighted averaging instead of simple averaging. In weighted averaging, the analyst selects the weights to assign to the individual forecasts. Typically, we will assign weights that minimize the variance of the forecast errors over the forecasted period. With only two forecasts, there is an easily derived mathematical solution for the weights that minimize this variance. However, with three or more forecasts being combined, the mathematical equations become much harder. In these scenarios we can use linear regression to help. We run a regression of the actual values of the target variable against the forecasted values from each model to let the regression choose the optimal weight for each. To make sure that the weights sum to one, we can use some algebra. By fixing the weight (coefficient) of one of the models at one using the offset function, we can then use the I function to estimate the weights (coefficients) on the differences between the remaining models and the offset model. Working out the math algebraically, each model inside an I function gets its estimated weight, while the model in the offset function gets a weight of 1 minus the remaining weights.
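A small synthetic check of this offset/I() algebra may help; the three "model forecasts" below are fake and built from known weights, so we can see that the regression recovers them and that the weights sum to one.

```r
# Three fake "model forecasts" f1-f3 and a target built from known weights
set.seed(1)
n  <- 200
f1 <- rnorm(n, 100, 5)
f2 <- f1 + rnorm(n)
f3 <- f1 + rnorm(n)
y  <- 0.5 * f1 + 0.3 * f2 + 0.2 * f3 + rnorm(n, sd = 0.1)
# offset() fixes f1's coefficient at 1; I() estimates weights on differences
fit <- lm(y ~ offset(f1) + I(f2 - f1) + I(f3 - f1) - 1)
w23 <- coef(fit)       # weights on models 2 and 3
w1  <- 1 - sum(w23)    # the offset model gets the remainder
round(c(w1, w23), 2)   # close to the true 0.50, 0.30, 0.20
```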
Pass.Fit.NN <- rep(NA, 207)
for(i in 25:207){
Pass.Fit.NN[i] <- training[i - 24] + NN.Model$fitted[i - 12]
}
Pass.Fit.ARIMA <- Full.ARIMA$fitted
Pass.Fit.HWES <- HWES.USAir.train$fitted
Pass.Fit.Prophet <- head(predict(Prof, forecast.data)$yhat, 207)
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
WC.Model <- lm(training ~ offset(Pass.Fit.HWES) + I(Pass.Fit.ARIMA - Pass.Fit.HWES) + I(Pass.Fit.NN - Pass.Fit.HWES) + I(Pass.Fit.Prophet - Pass.Fit.HWES) - 1)
summary(WC.Model)
##
## Call:
## lm(formula = training ~ offset(Pass.Fit.HWES) + I(Pass.Fit.ARIMA -
## Pass.Fit.HWES) + I(Pass.Fit.NN - Pass.Fit.HWES) + I(Pass.Fit.Prophet -
## Pass.Fit.HWES) - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3153.9 -741.1 -47.0 731.0 3826.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## I(Pass.Fit.ARIMA - Pass.Fit.HWES) 0.74970 0.06973 10.751 < 2e-16 ***
## I(Pass.Fit.NN - Pass.Fit.HWES) -0.02450 0.02483 -0.986 0.325
## I(Pass.Fit.Prophet - Pass.Fit.HWES) 0.28999 0.07251 3.999 9.79e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1178 on 156 degrees of freedom
## (48 observations deleted due to missingness)
## Multiple R-squared: 0.9995, Adjusted R-squared: 0.9995
## F-statistic: 1.097e+05 on 3 and 156 DF, p-value: < 2.2e-16
ARIMA.coef <- coef(WC.Model)[1]
NN.coef <- coef(WC.Model)[2]
Prophet.coef <- coef(WC.Model)[3]
HW.coef <- 1 - ARIMA.coef - NN.coef - Prophet.coef
Now we can just combine the forecasts with their respective weights to get our final forecast. From there we can evaluate the MAE and MAPE for this minimum variance weighted average forecast.
For.W.Avg <- HW.coef*HWES.USAir.train$mean +
ARIMA.coef*forecast::forecast(Full.ARIMA, xreg = cbind(Sep11, Sep11.L1, Sep11.L2, Sep11.L3, Sep11.L4, Sep11.L5, Sep11.L6, Anniv), h = 12)$mean +
NN.coef*Pass.Forecast +
Prophet.coef*tail(predict(Prof, forecast.data)$yhat, 12)
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
## Warning in make_holidays_df(year.list, m$country_holidays): Holidays for US are
## only supported from 1995 to 2044
W.Avg.error <- test - For.W.Avg
W.Avg.MAE <- mean(abs(W.Avg.error))
W.Avg.MAPE <- mean(abs(W.Avg.error)/abs(test))*100
W.Avg.MAE
## [1] 1068.435
W.Avg.MAPE
## [1] 1.605295
Unfortunately, just because we have the minimum variance on the training data doesn’t mean that we will do well on the test dataset. In fact, in practice, the simple average of forecasts is very hard to beat.